Hierarchical Categorisation of Web Tags for Delicious
Authors
J. Parra Arnau, J. Forné and D. Rebollo-Monedero are with the Department of Telematics Engineering, Universitat Politècnica de Catalunya, C. Jordi Girona 1-3, E-08034 Barcelona, Catalonia. E-mail: {javier.parra,jforne,david.rebollo}@entel.upc.edu
A. Perego is with the Institute for Environment and Sustainability of the Joint Research Centre, European Commission, I-21027 Ispra, Italy.
E. Ferrari is with the Department of Theoretical and Applied Science, University of Insubria, Via Mazzini 5, I-21100 Varese, Italy.
Abstract
[...]ion that enables us to conclude, directly from inspection of the user profile, that these users are interested in politics. In this sense, the categorisation of tags allows us to represent user profiles in a tractable manner, on the basis of a reduced set of meaningful categories of interest. Motivated by this, we next describe the methodology that we followed to categorise the Delicious data set retrieved by the Distributed Artificial Intelligence Laboratory (DAILabor) at Technische Universität Berlin [1]. This methodology is in line with other works in this field [2], [3].
The data set in question is organised in the form of triples (username, bookmark, tag), each one modelling the action of a user associating a bookmark with a tag. To accomplish this categorisation, we first carried out some preprocessing to filter out tags considered to be spam. For this purpose, we collected statistics on the number of characters per tag. After observing that 98% of tags had fewer than 23 characters, we dropped all tags longer than 22 characters. In addition, we eliminated posts with more than 50 tags, as they are usually spam [2]. Finally, posts with no tags were not considered.
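The three filtering rules above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function name `preprocess`, the constant names, the treatment of a post as a (username, bookmark) pair, the convention that an empty tag marks an untagged post, and the choice to count tags per post before dropping over-long tags are all assumptions, since the text does not specify them.

```python
# Thresholds taken from the text; the constant names are hypothetical.
MAX_TAG_LENGTH = 22      # tags with more than 22 characters are dropped
MAX_TAGS_PER_POST = 50   # posts with more than 50 tags are treated as spam


def preprocess(triples):
    """Filter (username, bookmark, tag) triples with the three rules above.

    Assumptions (not stated in the source): a post is a (username, bookmark)
    pair, an empty or None tag marks an untagged post, and the per-post tag
    count is taken before over-long tags are removed.
    """
    # Group the distinct tags of each post; dicts keep first-seen post order.
    posts = {}
    for user, bookmark, tag in triples:
        tags = posts.setdefault((user, bookmark), set())
        if tag:
            tags.add(tag)

    kept = []
    for (user, bookmark), tags in posts.items():
        # Rule 2: discard the whole post if it carries more than 50 tags.
        if len(tags) > MAX_TAGS_PER_POST:
            continue
        # Rule 1: drop individual tags longer than 22 characters.
        # Rule 3: a post with no remaining tags contributes no triples.
        for tag in sorted(tags):
            if len(tag) <= MAX_TAG_LENGTH:
                kept.append((user, bookmark, tag))
    return kept
```

Because tags are collected in a set per post, duplicate triples collapse into one. If the filters were applied in a different order (for example, dropping long tags before counting tags per post), the surviving triples could differ slightly.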
After this simple preprocessing, the number of triples was reduced to 1,149,895 and, consequently, the numbers of users, bookmarks and tags to 9,207, 349,658 and 54,024,
Similar resources
Eulerview with Projections: Non Hierarchical Visualisation
EulerView is a novel resource management tool, enabling the representation of non-hierarchical categorisation structures within which to place resources. Since the initial incarnation for use in file-system management, it has been integrated with other systems, such as Flickr to produce Eulr, which assists in user manipulation of photo tags. Another system currently under development is Eulicio...
Comparing Tweets and Tags for URLs
The free-form tags available from social bookmarking sites such as Delicious have been shown to be useful for a number of purposes and could serve as a cheap source of metadata about URLs on the web. Unfortunately recent years have seen a reduction in the popularity of such sites, however at the same time microblogging sites such as Twitter have exploded in popularity. On these sites users subm...
Facette: Using Facets to Improve Tag-based Bookmarking
Facette is a web service that uses facets to enhance the organizational capabilities of tag-based bookmarking systems. As with other bookmarking services, Facette allows users to associate tags with bookmarks to assist the retrieval of information. Facette also allows users to classify tags through use of facets. To create these facets, Facette introduces a method of facet creation called free ...
Find, New, Copy, Web, Page - Tagging for the (Re-)Discovery of Web Pages
Abstract. The World Wide Web has a very dynamic character with resources constantly disappearing and (re-)surfacing. A ubiquitous result is the “404 Page not Found” error as the request for missing web pages. We investigate tags obtained from Delicious for the purpose of rediscovering such missing web pages with the help of search engines. We determine the best performing tag based query length...
Semantic Disambiguation and Contextualisation of Social Tags
We present an algorithmic framework to accurately and efficiently identify the semantic meanings and contexts of social tags within a particular folksonomy. The framework is used for building contextualised tag-based user and item profiles. We also present its implementation in a system called cTag, with which we preliminary analyse semantic meanings and contexts of tags belonging to Delicious ...